Design and Evaluation of a Pipelined Distributed Information Retrieval Architecture
Abstract
Web-scale search engines deal with a volume of data and queries that forces them to make use of an index partitioned across many machines. Two main methods of partitioning an index for distributed processing have been described in the literature. In document partitioning, each processor node holds the information for a subset of documents, while in term partitioning, each node holds the information for a subset of terms. Additionally, a novel architecture, pipelining, has been proposed, offering to combine the best features of both architectures. This thesis develops a careful methodology for the experimental comparison of distributed information retrieval architectures, addressing questions such as experiment scalability and query set generation. Novel methods are proposed for accumulator pruning, and for compression of accumulators for shipping between nodes in the pipelined architecture. A meticulous experimental assessment of the three distributed architectures is then undertaken. The results demonstrate that term distribution suffers a severe processing bottleneck. Pipelining resolves term distribution’s processing bottleneck, while maintaining its superior I/O characteristics. However, pipelining suffers from serious load imbalance between the nodes, fails to fully utilise the cluster’s processing capacity, and scales poorly. Document distribution, in contrast, distributes workload evenly and scales well. Load balancing through the intelligent assignment of terms to partitions is explored, but fails to fully resolve the imbalance of the pipelined architecture. Instead, the partial replication of high-workload terms is proposed, coupled with the intelligent routing of queries. These techniques resolve pipelining’s load imbalance, allowing it to marginally outperform document distribution. The partially-replicated pipelined architecture is shown to benefit from system scale. 
It also significantly outperforms document distribution in a memory-limited environment, suggesting that it would also outperform document distribution for collections that are larger relative to cluster size. However, unlike document distribution, pipelining’s average response time at low to moderate loads does not scale. The system implementor must therefore weigh the relative benefits of the two architectures.
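As an illustration of the partitioning schemes described in the abstract, the following is a minimal Python sketch, not the thesis implementation. The toy collection, all function names, the round-robin and character-sum assignment rules, and the use of summed term frequency as the similarity score are assumptions made for this example. It shows document partitioning (each node indexes a subset of documents), term partitioning (each node holds the postings for a subset of terms), and a pipelined evaluation in which a set of accumulators is passed from node to node.

```python
from collections import defaultdict

# Toy collection: doc id -> text (hypothetical data for illustration only).
DOCS = {
    0: "distributed index search",
    1: "search engine query",
    2: "index partition query search",
    3: "pipelined query evaluation",
}


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def document_partition(docs, num_nodes):
    """Each node indexes a subset of documents (assumed: round-robin by id)."""
    shards = [{} for _ in range(num_nodes)]
    for doc_id, text in docs.items():
        shards[doc_id % num_nodes][doc_id] = text
    return [build_index(shard) for shard in shards]


def term_partition(docs, num_nodes):
    """Each node holds the complete postings lists for a subset of terms."""
    nodes = [{} for _ in range(num_nodes)]
    for term, postings in build_index(docs).items():
        # Assumed assignment rule: a deterministic character-sum hash.
        nodes[sum(map(ord, term)) % num_nodes][term] = postings
    return nodes


def query_document_partitioned(nodes, terms):
    """Every node scores its own documents; the results are then merged."""
    scores = {}
    for node in nodes:
        for term in terms:
            for doc_id, tf in node.get(term, {}).items():
                scores[doc_id] = scores.get(doc_id, 0) + tf
    return scores


def query_pipelined(nodes, terms):
    """Accumulators travel node to node; each node adds its own terms."""
    accumulators = {}
    for node in nodes:
        local_terms = [t for t in terms if t in node]
        if not local_terms:
            continue  # the query skips nodes that hold none of its terms
        for term in local_terms:
            for doc_id, tf in node[term].items():
                accumulators[doc_id] = accumulators.get(doc_id, 0) + tf
    return accumulators
```

In this sketch both evaluation strategies produce the same scores; they differ in where the work happens. Under document partitioning every node processes every query, whereas in the pipelined version only the nodes holding a query term do any work, which is one way to picture how uneven term workloads can lead to the load imbalance the abstract reports.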
Publication date: 2007